SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment
Authors
Abstract
Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information.
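The abstract's central point, that one pair of sequences can score and align differently depending on the substitution matrix supplied, can be illustrated with a minimal sketch. This is not SubVis code and does not use its API. The function names (`make_scorer`, `global_align`, `compare_matrices`), the toy scores, and the linear gap penalty are all assumptions made for illustration. A real analysis would use established matrices such as BLOSUM or PAM rather than the toy ones below.

```python
from typing import Callable, Dict, Optional, Tuple

Scorer = Callable[[str, str], int]


def make_scorer(match: int, mismatch: int,
                overrides: Optional[Dict[Tuple[str, str], int]] = None) -> Scorer:
    """Build a toy symmetric substitution 'matrix' as a scoring function (hypothetical helper)."""
    table: Dict[Tuple[str, str], int] = {}
    for (a, b), s in (overrides or {}).items():
        table[(a, b)] = s
        table[(b, a)] = s  # substitution scores are treated as symmetric

    def score(a: str, b: str) -> int:
        if (a, b) in table:
            return table[(a, b)]
        return match if a == b else mismatch

    return score


def global_align(seq1: str, seq2: str, score: Scorer, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (alignment score, aligned seq1, aligned seq2).
    """
    n, m = len(seq1), len(seq2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),  # substitution
                dp[i - 1][j] + gap,                                  # gap in seq2
                dp[i][j - 1] + gap,                                  # gap in seq1
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(a1)), "".join(reversed(a2))


def compare_matrices(seq1: str, seq2: str, matrices: Dict[str, Scorer], gap: int = -2):
    """Align the same pair under each named matrix so the results can be compared."""
    return {name: global_align(seq1, seq2, s, gap) for name, s in matrices.items()}


if __name__ == "__main__":
    matrices = {
        # Every mismatch is penalized equally.
        "identity": make_scorer(match=2, mismatch=-1),
        # Same, except isoleucine/leucine exchanges are scored as conservative.
        "conservative_IL": make_scorer(match=2, mismatch=-1, overrides={("I", "L"): 1}),
    }
    for name, (s, x, y) in compare_matrices("ACL", "ACI", matrices).items():
        print(name, s, x, y)
```

Running the same pair through several scorers at once is the manual comparison the abstract describes. The sketch only automates the bookkeeping and makes no claim about how SubVis implements it.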
Related resources
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Substitution Matrices of Residue Triplets Derived from Protein Blocks
In protein sequence alignment, residue similarity is usually evaluated by substitution matrix, which scores all possible exchanges of one amino acid with another. Several matrices are widely used in sequence alignment, including PAM matrices derived from homologous sequence and BLOSUM matrices derived from aligned segments of BLOCKS. However, most matrices have not addressed the high-order resi...
Pairwise alignment incorporating dipeptide covariation
MOTIVATION Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assum...
Finding Patterns in Protein Sequence and Structure
[wk 44] Wed 2 Nov 11: Lec 2: Pairwise alignment II, JH; Asn 1: Pairwise Alignment
[wk 45] Mon 7 Nov 11: Lec 3: Substitution matrices, SA, PB; Questions hour (QH)
[wk 45] Wed 9 Nov 11: Lec 4: Multiple Sequence Alignment I, JH; QH
[wk 46] Mon 14 Nov 11: Lec 5: Multiple Sequence Alignment II, JH; QH
[wk 46] Wed 16 Nov 11: Lec 6: Markov models, JH; Asn 2: HMM; Deadline Asn 1
[wk 47] Mon 21 Nov 11: Lec 7: Hidden Mar...
Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments.
The relative performances of four strategies for aligning a large number of protein sequences were assessed by referring to corresponding structural alignments of 54 independent families. Multiple sequence alignment of a family was constructed by a given method from the sequences of known structures and their homologues, and the subset consisting of the sequences of known structures was extract...
Journal title:
Volume 5, issue
Pages -
Publication date: 2017